AITopics | pruning metric

Collaborating Authors

pruning metric

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

7b75da9b61eda40fa35453ee5d077df6-Paper-Conference.pdf

Neural Information Processing SystemsFeb-10-2026, 02:39:54 GMT

data pruning, dataset, pruning, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts (0.04)
North America > Dominican Republic (0.04)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
(2 more...)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback

Beyond neural scaling laws: beating power law scaling via data pruning

Neural Information Processing SystemsDec-24-2025, 13:45:48 GMT

Widely observed neural scaling laws, in which error falls off as a power of the training set size, model size, or both, have driven substantial performance improvements in deep learning. However, these improvements through scaling alone require considerable costs in compute and energy. Here we focus on the scaling of error with dataset size and show how in theory we can break beyond power law scaling and potentially even reduce it to exponential scaling instead if we have access to a high-quality data pruning metric that ranks the order in which training examples should be discarded to achieve any pruned dataset size. We then test this improved scaling prediction with pruned dataset size empirically, and indeed observe better than power law scaling in practice on ResNets trained on CIFAR-10, SVHN, and ImageNet. Next, given the importance of finding high-quality pruning metrics, we perform the first large-scale benchmarking study of ten different data pruning metrics on ImageNet. We find most existing high performing metrics scale poorly to ImageNet, while the best are computationally intensive and require labels for every image. We therefore developed a new simple, cheap and scalable self-supervised pruning metric that demonstrates comparable performance to the best supervised metrics. Overall, our work suggests that the discovery of good data-pruning metrics may provide a viable path forward to substantially improved neural scaling laws, thereby reducing the resource costs of modern deep learning.

beating power law, data pruning, neural, (9 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.59)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Add feedback

7b75da9b61eda40fa35453ee5d077df6-Paper-Conference.pdf

Neural Information Processing SystemsAug-16-2025, 06:37:17 GMT

machine learning, natural language, pruning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts (0.04)
North America > Dominican Republic (0.04)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
(2 more...)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Enhancing Large Multimodal Models with Adaptive Sparsity and KV Cache Compression

Zhang, Te, Li, Yuheng, Wang, Junxiang, Li, Lujun

arXiv.org Artificial IntelligenceJul-29-2025

Large multimodal models (LMMs) have advanced significantly by integrating visual encoders with extensive language models, enabling robust reasoning capabilities. However, compressing LMMs for deployment on edge devices remains a critical challenge. In this work, we propose an adaptive search algorithm that optimizes sparsity and KV cache compression to enhance LMM efficiency. Utilizing the Tree-structured Parzen Estimator, our method dynamically adjusts pruning ratios and KV cache quantization bandwidth across different LMM layers, using model performance as the optimization objective. This approach uniquely combines pruning with key-value cache quantization and incorporates a fast pruning technique that eliminates the need for additional fine-tuning or weight adjustments, achieving efficient compression without compromising accuracy. Comprehensive evaluations on benchmark datasets, including LLaVA-1.5 7B and 13B, demonstrate our method superiority over state-of-the-art techniques such as SparseGPT and Wanda across various compression levels. Notably, our framework automatic allocation of KV cache compression resources sets a new standard in LMM optimization, delivering memory efficiency without sacrificing much performance.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2507.20613

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)

Add feedback

Prune&Comp: Free Lunch for Layer-Pruned LLMs via Iterative Pruning with Magnitude Compensation

Chen, Xinrui, Zhang, Hongxing, Zeng, Fanyi, Wei, Yongxian, Wang, Yizhi, Ling, Xitong, Li, Guanghao, Yuan, Chun

arXiv.org Artificial IntelligenceJul-25-2025

Layer pruning has emerged as a promising technique for compressing large language models (LLMs) while achieving acceleration proportional to the pruning ratio. In this work, we identify that removing any layer induces a significant magnitude gap in hidden states, resulting in substantial performance degradation. To address this issue, we propose Prune&Comp, a novel plug-and-play layer pruning scheme that leverages magnitude compensation to mitigate such gaps in a training-free manner. Specifically, we first estimate the magnitude gap caused by layer removal and then eliminate this gap by rescaling the remaining weights offline, with zero runtime overhead incurred. We further demonstrate the advantages of Prune&Comp through an iterative pruning strategy. When integrated with an iterative prune-and-compensate loop, Prune&Comp consistently enhances existing layer pruning metrics. For instance, when 5 layers of LLaMA-3-8B are pruned using the prevalent block influence metric, Prune&Comp nearly halves the perplexity and retains 93.19\% of the original model's question-answering performance, outperforming the baseline by 4.01%.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2507.18212

Genre: Research Report > Promising Solution (0.54)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback

ACE: Exploring Activation Cosine Similarity and Variance for Accurate and Calibration-Efficient LLM Pruning

Mi, Zhendong, Kong, Zhenglun, Yuan, Geng, Huang, Shaoyi

arXiv.org Artificial IntelligenceMay-29-2025

With the rapid expansion of large language models (LLMs), the demand for memory and computational resources has grown significantly. Recent advances in LLM pruning aim to reduce the size and computational cost of these models. However, existing methods often suffer from either suboptimal pruning performance or low time efficiency during the pruning process. In this work, we propose an efficient and effective pruning method that simultaneously achieves high pruning performance and fast pruning speed with improved calibration efficiency. Our approach introduces two key innovations: (1) An activation cosine similarity loss-guided pruning metric, which considers the angular deviation of the output activation between the dense and pruned models. (2) An activation variance-guided pruning metric, which helps preserve semantic distinctions in output activations after pruning, enabling effective pruning with shorter input sequences. These two components can be readily combined to enhance LLM pruning in both accuracy and efficiency. Experimental results show that our method achieves up to an 18% reduction in perplexity and up to 63% decrease in pruning time on prevalent LLMs such as LLaMA, LLaMA-2, and OPT.

large language model, machine learning, pruning, (18 more...)

arXiv.org Artificial Intelligence

2505.21987

Country: North America > United States > Maryland (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

STADE: Standard Deviation as a Pruning Metric

Mecke, Diego Coello de Portugal, Alyoussef, Haya, Koloiarov, Ilia, Stubbemann, Maximilian, Schmidt-Thieme, Lars

arXiv.org Artificial IntelligenceMar-28-2025

Recently, Large Language Models (LLMs) have become very widespread and are used to solve a wide variety of tasks. To successfully handle these tasks, LLMs require longer training times and larger model sizes. This makes LLMs ideal candidates for pruning methods that reduce computational demands while maintaining performance. Previous methods require a retraining phase after pruning to maintain the original model's performance. However, state-of-the-art pruning methods, such as Wanda, prune the model without retraining, making the pruning process faster and more efficient. Building upon Wanda's work, this study provides a theoretical explanation of why the method is effective and leverages these insights to enhance the pruning process. Specifically, a theoretical analysis of the pruning problem reveals a common scenario in Machine Learning where Wanda is the optimal pruning method. Furthermore, this analysis is extended to cases where Wanda is no longer optimal, leading to the development of a new method, STADE, based on the standard deviation of the input. From a theoretical standpoint, STADE demonstrates better generality across different scenarios. Finally, extensive experiments on Llama and Open Pre-trained Transformers (OPT) models validate these theoretical findings, showing that depending on the training conditions, Wanda's optimal performance varies as predicted by the theoretical framework. These insights contribute to a more robust understanding of pruning strategies and their practical implications. Code is available at: https://github.com/Coello-dev/STADE/

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2503.22451

Country:

Europe > Germany > Lower Saxony (0.05)
North America > United States > Maryland > Montgomery County > Gaithersburg (0.04)
Europe > Portugal (0.04)
Europe > Czechia > Prague (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Maximum Redundancy Pruning: A Principle-Driven Layerwise Sparsity Allocation for LLMs

Gao, Chang, Zhao, Kang, Chen, Jianfei, Jing, Liping

arXiv.org Artificial IntelligenceMar-24-2025

Large language models (LLMs) have demonstrated impressive capabilities, but their enormous size poses significant challenges for deployment in real-world applications. To address this issue, researchers have sought to apply network pruning techniques to LLMs. A critical challenge in pruning is allocation the sparsity for each layer. Recent sparsity allocation methods is often based on heuristics or search that can easily lead to suboptimal performance. In this paper, we conducted an extensive investigation into various LLMs and revealed three significant discoveries: (1) the layerwise pruning sensitivity (LPS) of LLMs is highly non-uniform, (2) the choice of pruning metric affects LPS, and (3) the performance of a sparse model is related to the uniformity of its layerwise redundancy level. Based on these observations, we propose that the layerwise sparsity of LLMs should adhere to three principles: \emph{non-uniformity}, \emph{pruning metric dependency}, and \emph{uniform layerwise redundancy level} in the pruned model. To this end, we proposed Maximum Redundancy Pruning (MRP), an iterative pruning algorithm that prunes in the most redundant layers (\emph{i.e.}, those with the highest non-outlier ratio) at each iteration. The achieved layerwise sparsity aligns with the outlined principles. We conducted extensive experiments on publicly available LLMs, including the LLaMA2 and OPT, across various benchmarks. Experimental results validate the effectiveness of MRP, demonstrating its superiority over previous methods.

large language model, machine learning, sparsity, (18 more...)

arXiv.org Artificial Intelligence

2503.18377

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.41)

Add feedback

OPTISHEAR: Towards Efficient and Adaptive Pruning of Large Language Models via Evolutionary Optimization

Liu, Shuqi, He, Bowei, Wu, Han, Song, Linqi

arXiv.org Artificial IntelligenceFeb-15-2025

Post-training pruning has emerged as a crucial optimization technique as large language models (LLMs) continue to grow rapidly. However, the significant variations in weight distributions across different LLMs make fixed pruning strategies inadequate for multiple models. In this paper, we introduce \textbf{\textsc{OptiShear}}, an efficient evolutionary optimization framework for adaptive LLM pruning. Our framework features two key innovations: an effective search space built on our Meta pruning metric to handle diverse weight distributions, and a model-wise reconstruction error for rapid evaluation during search trials. We employ Non-dominated Sorting Genetic Algorithm III (NSGA-III) to optimize both pruning metrics and layerwise sparsity ratios. Through extensive evaluation on LLaMA-1/2/3 and Mistral models (7B-70B) across multiple benchmarks, we demonstrate that our adaptive pruning metrics consistently outperform existing methods. Additionally, our discovered layerwise sparsity ratios enhance the effectiveness of other pruning metrics. The framework exhibits strong cross-task and cross-model generalizability, providing a cost-effective solution for model compression.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2502.10735

Genre: Research Report (1.00)

Technology: